Abstract

We study the problem of tracking an object moving through a network of wireless sensors. In order to conserve energy, the sensors may be put into a sleep mode with a timer that determines their sleep duration. It is assumed that an asleep sensor cannot be communicated with or woken up, and hence the sleep duration needs to be determined at the time the sensor goes to sleep based on all the information available to the sensor. Having sleeping sensors in the network could result in degraded tracking performance; therefore, there is a tradeoff between energy usage and tracking performance. We design sleeping policies that attempt to optimize this tradeoff and characterize their performance. As an extension to our previous work in this area [1], we consider generalized models for object movement, object sensing, and tracking cost. For discrete state spaces and continuous Gaussian observations, we derive a lower bound on the optimal energy-tracking tradeoff. It is shown that in the low tracking error regime, the generated policies approach the derived lower bound.



I. Introduction 



Large sensor networks collecting data in dynamic environments are typically composed of a distributed 
collection of cheap nodes with limited energy and processing capabilities. Hence, it is imperative to 
efficiently manage the sensors' resources to prolong the lifetime of such networks without sacrificing 
performance. Our focus in this paper is on sensor resource management for tracking and surveillance 
applications. 
Previous work on sensor resource management considered the design of sensor sleeping protocols via wakeup mechanisms [2]-[7] or by modifying power-save functions in MAC protocols for wireless ad hoc networks [8]-[10]. In the context of target classification, Castanon [11] developed an approximate dynamic programming approach for dynamic scheduling of multi-mode sensors subject to sensor resource constraints. In [12], [13] we studied a single object tracking problem where the sensors can be turned on or off at consecutive time steps to conserve energy (sensor scheduling). A controller selects the subset of sensors to activate at each time step. Also in [12], we studied a tracking problem where each sensor could enter a sleep mode with a sleep timer (sensor sleeping). While in sleep mode, the sensor could not assist in tracking the object by making observations. In contrast to [13], in [1] we assumed that sleeping sensors could not be woken up externally but instead had to set internal timers to determine the next time to come awake; hence, the control actions correspond to the sleep durations of awake sensors. This not only entailed a different control space, but also led to a significantly different policy design problem, since a decision to put a sensor to sleep implies that this sensor cannot be scheduled at future time steps until it comes awake. The consequences of the current action on the tracking performance could therefore be more dramatic, rendering future planning more crucial. This led to a design problem that sought to optimize a tradeoff between energy efficiency and tracking performance. While optimal solutions to this problem could not be found, suboptimal solutions were devised that were demonstrated to be near optimal. To aid analysis, we assumed particularly simple models for object movement, object sensing, and tracking cost. In particular, we assumed that the network could be divided into cells, each of which contained a single sensor. The object moved among the cells and could only be observed by the sensor in the currently occupied cell. Tracking performance was a binary quantity: either the object was observed in a particular time slot or it was not, depending on whether the right sensor was awake.

In this paper, we continue to examine the fundamental theory of sleeping in sensor networks for tracking, but we extend our analysis to more generalized models for object movement, object sensing, and tracking cost. We allow the number of possible object locations to be different from the number of sensors. The number of possible object locations can even be infinite to model the movement of an object on a continuum. Moreover, the object sensing model allows for an arbitrary distribution for the observations given the current object location, and the tracking cost is modeled via an arbitrary distance measure between the actual and estimated object locations.

Not surprisingly, this generalization results in a problem that is much more difficult to analyze. Our 
approach is to build on the policies designed in [1]. The design of those policies relied on the separation 
of the problem into a set of simpler subproblems. In [1], we have shown that under an observable-after-control assumption, the design problem lends itself to a natural decomposition into simpler per-sensor subproblems due to the simplified nature of the tracking cost structure. Unfortunately, this does not extend to the generalized cases we consider herein. However, based on the intuition gained from the structure of the solution in the simplified case, in this work we artificially separate our problem into a set of simpler per-sensor subproblems. The parameters of these subproblems are not known a priori due to the difficulties in analysis; instead, we use Monte Carlo simulation and learning algorithms to compute them. We characterize the performance of the resulting sleeping policies through simulation. For the special case of a discrete state space with continuous Gaussian observations, we derive a lower bound on the optimal energy-tracking tradeoff, which is shown to be loose in the high tracking error regime but reasonably tight in the low tracking error regime.

The remainder of this paper is organized as follows. In Section II we describe the tracking problem in mathematical terms and define the optimization problem. In Section III we derive our suboptimal solutions and the aforementioned lower bound. In Section IV we provide numerical results that illustrate the efficacy of the proposed sleeping policies. We summarize and conclude in Section V.



II. Problem Formulation

A. POMDP Formulation

Consider a network with $n$ sensors. Each sensor can be in one of two states: awake or asleep. A sensor in the awake state consumes more energy than one in the asleep state. However, object sensing can be performed only in the awake state. We denote the set of possible object locations as $\mathcal{B}$, with $|\mathcal{B}| = m + 1$, where the $(m+1)$-th state represents an absorbing terminal state that occurs when the object leaves the network. We also refer to this terminal state as $\mathcal{T}$. If $\mathcal{B}$ is not a finite set, then $m = \infty$. We define a kernel $P$ such that $P(x, \mathcal{Y})$ is the probability that the next object location is in the set $\mathcal{Y} \subseteq \mathcal{B}$ given that the current object location is $x$. We can predict $t$ time steps into the future by defining $P^1 = P$ and $P^t$ inductively as

$$P^t(x, \mathcal{Y}) = \int_{\mathcal{B}} P^{t-1}(x, dy)\, P(y, \mathcal{Y}) \tag{1}$$

Suppose $p$ is a probability measure on $\mathcal{B}$ such that $p(\mathcal{X})$ for $\mathcal{X} \subseteq \mathcal{B}$ is the probability that the state is in $\mathcal{X}$ at the current time step. Then the probability that the state will be in $\mathcal{Y}$ after $t$ time steps in the future is given by

$$(pP^t)(\mathcal{Y}) = \int_{\mathcal{B}} P^t(x, \mathcal{Y})\, p(dx) \tag{2}$$
This defines the measure $pP^t$, which depends on both the prior $p$ and the transition kernel $P$. Let $b_k$ denote the state of the object at time $k$. Also, let $\delta_x$ denote a probability measure such that $\delta_x(\mathcal{A}) = 1$ if $x \in \mathcal{A}$, and $\delta_x(\mathcal{A}) = 0$ otherwise. Conditioned on the object state $b_k$, the future state $b_{k+1}$ has distribution $\delta_{b_k} P$. This defines the evolution of the object location. For a discrete state space, this is simply the probability mass function defined by the $b_k$-th row of a transition matrix $P$. We assume that it is always possible to determine whether the object has left the network, i.e., whether $b_k = m + 1$. To this end, we define a virtual sensor $n + 1$ that detects without error whether the object has left the network. In other words, sensor $n + 1$ is always awake but consumes no energy.

To provide a means for centralized control, we assume the presence of an extra node called the central 
controller. The central controller keeps track of the state of the network and assigns sleep times to sensors 
that are awake. In particular, each sensor that wakes up remains awake for one time unit during which 
the following actions are taken: (i) the sensor sends its observation of the object to the central unit, and 
(ii) the sensor receives a new sleep time (which may equal zero) from the central controller. The sleep 
time input is used to initialize a timer at the sensor that is decremented by one time unit each time step. 
When this timer expires, the sensor wakes up. Since we assume that wakeup signals are impractical, this 
timer expiration is the only mechanism for waking a sensor. 

Let $r_{k,\ell}$ denote the value of the sleep timer of sensor $\ell$ at time $k$. We call the $(n+1)$-vector $r_k = (r_{k,1}, \ldots, r_{k,n+1})$ the residual sleep times of the sensors at time $k$. Also, let $u_{k,\ell}$ denote the sleep time input supplied to sensor $\ell$ at time $k$. We add the constraints $r_{k,n+1} = 0$ and $u_{k,n+1} = 0$ due to the nature of the virtual sensor $n + 1$. We can describe the evolution of the residual sleep times as

$$r_{k+1,\ell} = (r_{k,\ell} - 1)\,\mathbb{1}\{r_{k,\ell} > 0\} + u_{k,\ell}\,\mathbb{1}\{r_{k,\ell} = 0\} \tag{3}$$

for all $k$ and $\ell \in \{1, \ldots, n+1\}$. The first term on the right-hand side of this equation expresses that if the sensor is currently asleep (its sleep timer is not zero), the sleep timer is decremented by 1. The second term expresses that if the sensor is currently awake (its sleep timer is zero), the sleep timer is reset to the current sleep time input for that sensor.

Based on the probabilistic evolution of the object location and (3), we see that we have a discrete-time dynamical model that describes our system with a well-defined state evolution. The state of the system at time $k$ is described by $x_k = (b_k, r_k)$. Unfortunately, not all of $x_k$ is known to the central unit at time $k$, since $b_k$ is known only if the object location is being tracked precisely. Thus we have a dynamical system with incomplete (or partially observed) state information.
We write the observations for our problem as

$$z_k = (s_k, r_k) \tag{4}$$

where $s_k$ is an $(n+1)$-vector of observations. These observations are drawn from a probability measure $\alpha_{x_k}$ that depends on $x_k$. However, we add two restrictions. The first is that if a sensor is not awake at time $k$, its observation is an erasure. Mathematically, we say that $r_{k,\ell} > 0$ implies $s_{k,\ell} = \varepsilon$. The second restriction is that $s_{k,n+1}$ is a binary observation that indicates whether the object has left the network. The total information available to the control unit at time $k$ is given by

$$I_k = (z_0, \ldots, z_k, u_0, \ldots, u_{k-1}) \tag{5}$$

with $I_0 = z_0$ denoting the initial (known) state of the system. The control input for sensor $\ell$ at time $k$ is allowed to be a function of the information state $I_k$, i.e.,

$$u_k = \mu_k(I_k) \tag{6}$$

The vector-valued function $\mu_k$ is the sleeping policy at time $k$, which defines a mapping from the information state $I_k$ to the set of admissible actions $u_k$.

We now identify the costs present in our tracking problem. The first is an energy cost of $c > 0$ for each sensor that is awake. The energy cost can be written mathematically as

$$\sum_{\ell=1}^{n} c\,\mathbb{1}\{r_{k,\ell} = 0\} \tag{7}$$

The second cost is a tracking cost. To define the tracking cost, we first define the estimated object location at time $k$ to be $\hat{b}_k$. We can think of $\hat{b}_k$ as an additional control input that is a function of $I_k$, i.e.,

$$\hat{b}_k = \beta_k(I_k) \tag{8}$$

Since $\hat{b}_k$ does not affect the state evolution, we do not need past values of this control input in $I_k$. The tracking cost is a distance measure that is a function of the actual and estimated object locations and is written as

$$d(b_k, \hat{b}_k) \tag{9}$$

We assume that $d$ is a bounded function on $\mathcal{B} \times \mathcal{B}$. Two examples of distance measures we might employ are the Hamming cost (if the space $\mathcal{B}$ is finite), i.e.,

$$d(b_k, \hat{b}_k) = \mathbb{1}\{\hat{b}_k \neq b_k\} \tag{10}$$
and the squared Euclidean distance (if the space $\mathcal{B}$ is a subset of an appropriate vector space), i.e.,

$$d(b_k, \hat{b}_k) = \|\hat{b}_k - b_k\|_2^2 \tag{11}$$



The parameter c is used to trade off energy consumption and tracking errors. 

Recall that the input $\hat{b}_k$ does not affect the state evolution; it only affects the cost. Therefore, we can compute the optimal choice of $\hat{b}_k$, given by $\beta_k^*(I_k)$, using an optimization minimizing the tracking error over a single time step. We can thus write

$$\beta_k^*(I_k) = \arg\min_{\hat{b}} \mathbb{E}\left[d(b_k, \hat{b}) \mid I_k\right] \tag{12}$$



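Remembering that once the terminal state is reached no further cost is incurred, we can write the total cost for time step $k$ as

$$g(b_k, I_k) = \mathbb{1}\{b_k \neq \mathcal{T}\}\left(d\big(b_k, \beta_k^*(I_k)\big) + \sum_{\ell=1}^{n} c\,\mathbb{1}\{r_{k,\ell} = 0\}\right) \tag{13}$$

The infinite horizon cost for the system is given by

$$J(I_0, \mu_0, \mu_1, \ldots) = \mathbb{E}\left[\sum_{k=1}^{\infty} g(b_k, I_k)\right] \tag{14}$$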



Since $g$ is bounded (because $d$ is bounded) and the expected time until the object leaves the network is finite, the cost function $J$ is well defined. The goal is to compute the solution to

$$\min_{\mu_0, \mu_1, \ldots} J(I_0, \mu_0, \mu_1, \ldots) \tag{15}$$



The solution to this optimization problem for each value of $c$ yields an optimal sleeping policy. The optimization problem falls under the framework of a partially observable Markov decision process (POMDP) [14]-[17].



B. Dealing With Partial Observability

Partial observability presents a problem since the information for decision-making at time $k$ given in (5) grows without bound in memory. To remedy this, we seek a sufficient statistic for optimization that is bounded in memory. The observation $s_k$ depends only on $x_k$, which in turn depends only on $x_{k-1}$, $u_{k-1}$, and some random disturbance $w_{k-1}$. It is a standard argument (e.g., see [18]) that for such an observation model, a sufficient statistic is given by the probability distribution of the state $x_k$ given $I_k$. Such a sufficient statistic is referred to as a belief state in the POMDP literature (e.g., see [13], [14]).
Since the residual sleep times portion of our state is observable, the sufficient statistic can be written as $v_k = (p_k, r_k)$, where $p_k$ is a probability measure on $\mathcal{B}$. Mathematically, we have

$$p_k(\mathcal{X}) = \mathbb{P}(b_k \in \mathcal{X} \mid I_k) \tag{16}$$

The task of recursively computing $p_k$ for each $k$ is a problem in nonlinear filtering (e.g., see [19]). In other words, $p_{k+1}$ can be computed using standard Bayesian techniques as the posterior measure resulting from the prior measure $p_k P$ and the observations $s_{k+1}$.
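For a finite state space, this Bayesian update amounts to a prediction step with $P$ followed by multiplication by the observation likelihood. The sketch below assumes the Gaussian received-signal model used later in Section III-D (a per-sensor mean table `means` and noise level `sigma`, both assumptions here), treats asleep sensors as erasures, and uses the virtual sensor to decide whether the object has left; `belief_update` is a hypothetical name.

```python
import numpy as np


def belief_update(p, P, s, means, sigma, exited):
    """One Bayesian filtering step p_k -> p_{k+1} on a finite state space.

    p      : (m+1,) current belief p_k; index m is the terminal state T
    P      : (m+1, m+1) transition matrix; row m is the absorbing state T
    s      : length-n list of observations s_{k+1}; None marks an erasure
    means  : (m, n) array, means[j, l] = mean observation of sensor l if b = j
    sigma  : Gaussian noise standard deviation (assumed observation model)
    exited : reading of the virtual sensor n + 1 (True if the object left)
    """
    m = P.shape[0] - 1
    post = np.zeros(m + 1)
    if exited:  # the virtual sensor is error-free
        post[m] = 1.0
        return post
    pred = p @ P  # prior for time k+1: p_k P
    loglik = np.zeros(m)
    for l, s_l in enumerate(s):
        if s_l is None:  # asleep sensor: erasure, carries no information
            continue
        loglik += -0.5 * ((s_l - means[:, l]) / sigma) ** 2
    w = pred[:m] * np.exp(loglik - loglik.max())  # shift for numerical stability
    post[:m] = w / w.sum()  # condition on "object still in the network"
    return post


P = np.array([[0.5, 0.4, 0.1], [0.4, 0.5, 0.1], [0.0, 0.0, 1.0]])
means = np.array([[1.0, 0.0], [0.0, 1.0]])
p0 = np.array([1.0, 0.0, 0.0])
print(belief_update(p0, P, [None, None], means, 1.0, False))  # approximately [0.556, 0.444, 0.0]
```

With every real sensor asleep, the update reduces to the prediction $p_k P$ conditioned on the object not having left, which is why sleeping sensors degrade, but do not stop, tracking.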

The function that determines $\hat{b}_k$ can now be written in terms of $p_k$ and $r_k$ instead of $I_k$. We can rewrite it as

$$\beta_k^*(p_k, r_k) = \arg\min_{\hat{b}} \mathbb{E}\left[d(b_k, \hat{b}) \mid b_k \sim p_k\right] \tag{17}$$

$$= \arg\min_{\hat{b}} \int_{\mathcal{B}} d(b, \hat{b})\, p_k(db) \tag{18}$$

Note that due to the stationarity of the state evolution, $\beta_k^*$ has the same form for every $k$ and is independent of $r_k$. Thus, we can drop the subscript and refer to $\beta_k^*$ as $\beta^*$, a function of $p_k$ alone.
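For the two distance measures in (10) and (11), the minimization in (18) has a closed form on a finite state space: the MAP location for the Hamming cost and the posterior mean for the squared Euclidean cost. A minimal sketch, assuming `p` holds only the mass on $\mathcal{B} - \mathcal{T}$ (object still in the network) and `coords` is a hypothetical $(m, d)$ array of location coordinates; the function names are also hypothetical.

```python
import numpy as np


def beta_star_hamming(p):
    """(18) with the Hamming cost (10): the minimizer is the MAP location."""
    return int(np.argmax(p))


def beta_star_squared(p, coords):
    """(18) with the squared Euclidean cost (11): the posterior mean.
    If estimates must lie in a finite B, take the grid point nearest to it."""
    return p @ coords


def expected_cost_hamming(p):
    """Value of (18) at the minimizer for the Hamming cost: 1 - max_b p({b})."""
    return 1.0 - float(np.max(p))


def expected_cost_squared(p, coords):
    """Value of (18) at the posterior mean: the total posterior variance."""
    mu = p @ coords
    return float(p @ np.sum((coords - mu) ** 2, axis=1))


p = np.array([0.2, 0.5, 0.3])
coords = np.array([[0.0], [1.0], [2.0]])
print(beta_star_hamming(p), expected_cost_hamming(p))
```

The "nearest grid point" remark follows from $\mathbb{E}\|b - x\|^2 = \mathbb{E}\|b - \bar{b}\|^2 + \|\bar{b} - x\|^2$, where $\bar{b}$ is the posterior mean. The Hamming value is the same quantity as the estimate (32) used in Section III-C.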

Now we write our dynamic programming problem in terms of the sufficient statistic. We first rewrite the cost at time step $k$. Since only expected values of the cost function $g$ appear in (14), we can take our cost function to be the expected value of $g$ (defined in (13)) conditioned on $b_k$ being distributed according to $p_k$. With a slight abuse of notation, we call this redefined cost $g$. The cost can then be written as

$$g(p_k, r_k) = \int_{\mathcal{B}} \mathbb{1}\{b \neq \mathcal{T}\}\left(d\big(b, \beta^*(p_k)\big) + \sum_{\ell=1}^{n} c\,\mathbb{1}\{r_{k,\ell} = 0\}\right) p_k(db) \tag{19}$$

$$= \int_{\mathcal{B} - \mathcal{T}} \left(d\big(b, \beta^*(p_k)\big) + \sum_{\ell=1}^{n} c\,\mathbb{1}\{r_{k,\ell} = 0\}\right) p_k(db) \tag{20}$$
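On a finite state space, (20) is a weighted sum over the non-terminal states. The sketch below assumes a distance matrix `dist` (for example, `1 - np.eye(m)` for the Hamming cost) and, as a simplification, chooses the estimate by minimizing the expected distance over $\mathcal{B} - \mathcal{T}$ only; `stage_cost` is a hypothetical name.

```python
import numpy as np


def stage_cost(p, r, c, dist):
    """Eq. (20) on a finite B = {0, ..., m-1} plus the terminal index m.

    p    : (m+1,) belief p_k
    r    : residual sleep times of the n real sensors (virtual sensor omitted)
    c    : energy cost per awake sensor
    dist : (m, m) array with dist[b, b_hat] = d(b, b_hat)
    """
    m = len(p) - 1
    q = np.asarray(p[:m], dtype=float)  # mass on B - T; no cost after exit
    exp_dist = q @ dist  # exp_dist[b_hat] = sum_b q(b) d(b, b_hat)
    b_hat = int(np.argmin(exp_dist))  # estimate chosen on B - T (assumption)
    n_awake = sum(1 for r_l in r if r_l == 0)
    return float(exp_dist[b_hat] + c * n_awake * q.sum())


p = np.array([0.5, 0.3, 0.2])  # 20% chance the object has already left
print(stage_cost(p, [0, 1, 0], 0.1, 1 - np.eye(2)))
```

Note that the energy term is scaled by the probability that the object is still in the network, matching the indicator $\mathbb{1}\{b \neq \mathcal{T}\}$ in (19).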

The selection of sleep times, originally presented in (6), can now be rewritten as

$$u_k = \mu_k(p_k, r_k) \tag{21}$$

The total cost defined in (14) becomes

$$J(p_0, r_0, \mu_0, \mu_1, \ldots) = \mathbb{E}\left[\sum_{k=1}^{\infty} g(p_k, r_k)\right] \tag{22}$$

and the optimal cost defined in (15) becomes

$$J^*(p_0, r_0) = \min_{\mu_0, \mu_1, \ldots} J(p_0, r_0, \mu_0, \mu_1, \ldots) \tag{23}$$
III. Suboptimal Solutions

Similar to the problem in [1], an optimal policy could be found by solving the Bellman equation

$$J(p, r) = \min_{\mu} \mathbb{E}\left[g(p_1, r_1) + J(p_1, r_1) \mid p_0 = p,\ r_0 = r,\ u_0 = \mu(p_0, r_0)\right] \tag{24}$$

However, since an optimal solution could not be found for the simpler problem considered in [1], we immediately turn our attention to finding suboptimal solutions to our problem.

Note that in [1], simpler sensing models and cost structures were employed. Under a simplifying
observable-after-control assumption, the simplicity of the sensing models allowed for the decoupling 
of the contributions of the individual sensors. The simplicity of the cost structures allowed the cost to 
be written as a sum of per-sensor costs. The result was a problem that could be written as a number 
of simpler subproblems. The present case is more complicated. In general, the cooperation among the 
sensors may be difficult to analyze and understand. Furthermore, the tracking cost may not be easily 
written as a sum across the sensors. 

Based on the intuition gained from [1], our approach to generating suboptimal solutions is to artificially
write the problem as a set of subproblems that can be solved using the techniques of [1]. The tracking
cost expressions (which are a function of the sleeping actions of the sensors) in these subproblems will 
be left as unknowns. To determine appropriate values for these tracking costs, we either perform Monte 
Carlo simulations before tracking begins or use data gathered during tracking. The intuition is that if the 
resultant tracking cost expressions capture the "typical" behavior of the actual tracking cost, then our 
sleeping policies should perform well. 

A. General approach 

The complexity of the sleeping problem stems from: 

1) The complicated evolution of the belief state $p_k$ (nonlinear filtering).

2) The complexity of the model including the dimensionality of the state space, the control space and 
the observation space. 

To address the aforementioned difficulties, our approach has two main ingredients. First, we make assumptions about the observations that will be available to the controller at future time steps. To generate sleeping policies, we assume that the system is either perfectly observable or totally unobservable after control. Hence, we define approximate recursions with special structure as surrogates for the optimal value function. Second, we devise different methodologies to evaluate suitable tracking costs in Sections III-B and III-C, whereby we capture the effect of each sensor on the overall tracking cost. Writing the combined tracking cost as the sum of independent contributions of different sensors (with respect to some baseline) allows us to write the Bellman equation as the sum of per-sensor recursions. Instead of solving the Bellman equation in (24), we solve $n$ simpler Bellman equations to find per-sensor policies and cost functions. The overall policy is then the per-sensor policies applied in parallel.

We denote by $J^{(\ell)}$ the cost function of the $\ell$-th sensor's approximate subproblem. We define $T^A(b, \ell)$ to be the increase in tracking cost due to not waking up sensor $\ell$ at time $k$ given that $b_{k-1} = b$. This is meant to capture the contribution of the $\ell$-th sensor to the total tracking cost. Next we define our approximations.

1) $Q_{MDP}$: First introduced in the artificial intelligence literature [20], [21], the $Q_{MDP}$ solution for POMDPs assumes that the system will be perfectly observable after control, i.e., the partially observable state becomes fully observable after taking a control action. In other words, under a $Q_{MDP}$ assumption the belief state simply evolves as

$$p_{k+1} = \delta_{b_{k+1}} \tag{25}$$

Since the future cost is affected by the current control action not only through the belief evolution, but also by the fact that no future decisions can be made for a sleeping sensor until it wakes up, the observable-after-control policy is by no means a myopic policy. Note that (25) does not imply zero tracking errors; it is merely an assumption simplifying the state evolution in order to generate a sleeping policy. Now we can readily define a $Q_{MDP}$ per-sensor Bellman equation analogous to the one in [1] as

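$$J^{(\ell)}(p) = \min_{u} \left( \sum_{j=1}^{u} \int_{\mathcal{B} - \mathcal{T}} T^A(b, \ell)\, (pP^j)(db) + \int_{\mathcal{B} - \mathcal{T}} \big(c + J^{(\ell)}(\delta_b)\big)\, (pP^{u+1})(db) \right) \tag{26}$$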

To clarify, the first summation on the right-hand side of (26) corresponds to the expected tracking cost incurred by the sleep duration $u$ of sensor $\ell$. The second term consists of: (i) the energy cost incurred as the sensor comes awake after its sleep timer expires (after $u + 1$ time slots); and (ii) the cost-to-go under an observable-after-control assumption (hence the belief state is $\delta_b$).

We cannot find an analytical solution for (26). However, note that if we can solve (26) for $p = \delta_b$ for all $b$, then it is straightforward to find the solution for all values of $p$. Thus, given a function $T^A$, (26) can be solved through standard policy iteration [18], but only if $\mathcal{B}$ is finite.
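Solving (26) at the point masses $p = \delta_b$ can be sketched for one sensor on a finite state space as follows. Because $\mathcal{T}$ is absorbing, $P^j$ restricted to $\mathcal{B} - \mathcal{T}$ equals the $j$-th power of the non-terminal block of $P$. Two assumptions differ from the text: the minimization over $u$ is truncated at a hypothetical `u_max`, and we substitute plain value iteration for the policy iteration cited above, purely for brevity. Function and parameter names are hypothetical.

```python
import numpy as np


def qmdp_point_mass_values(P, TA, c, u_max=50, tol=1e-10, max_iter=100_000):
    """Iteratively solve Eq. (26) at the point masses p = delta_b, b in B - T.

    P     : (m+1, m+1) transition matrix; index m is the absorbing terminal state
    TA    : (m,) values T^A(b, l) for one fixed sensor l
    c     : energy cost of one awake slot
    u_max : largest sleep time considered (truncation of the min over u)
    Returns (V, policy) with V[b] ~ J^(l)(delta_b) and policy[b] the minimizing u.
    """
    m = P.shape[0] - 1
    Q = P[:m, :m]  # T is absorbing, so P^j on B - T equals Q^j
    powers = [np.eye(m)]
    for _ in range(u_max + 1):
        powers.append(powers[-1] @ Q)  # powers[j] = Q^j
    # sleep_cost[u, b] = sum_{j=1}^{u} sum_{b'} Q^j[b, b'] TA[b']  (first term of (26))
    sleep_cost = np.zeros((u_max + 1, m))
    for u in range(1, u_max + 1):
        sleep_cost[u] = sleep_cost[u - 1] + powers[u] @ TA
    V = np.zeros(m)
    for _ in range(max_iter):
        # cand[u, b]: right-hand side of (26) at p = delta_b for sleep time u
        cand = sleep_cost + np.stack([powers[u + 1] @ (c + V) for u in range(u_max + 1)])
        V_new = cand.min(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, cand.argmin(axis=0)


P = np.array([[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.0, 0.0, 1.0]])
V, policy = qmdp_point_mass_values(P, np.array([1.0, 1.0]), c=0.1)
```

Once `V` is available, $J^{(\ell)}(p)$ for a general belief follows by evaluating the right-hand side of (26) with $J^{(\ell)}(\delta_b)$ replaced by `V[b]`, which is the "straightforward" step mentioned above.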

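2) FCR: Similarly, we define a First Cost Reduction (FCR) Bellman equation analogous to the one in [1] as

$$J^{(\ell)}(p) = \min_{u} \left( \sum_{j=1}^{u} \int_{\mathcal{B} - \mathcal{T}} T^A(b, \ell)\, (pP^j)(db) + c \int_{\mathcal{B} - \mathcal{T}} (pP^{u+1})(db) + J^{(\ell)}(pP^{u+1}) \right) \tag{27}$$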
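In this case, it is assumed that we will have no future observations. In other words, we define the belief evolution as $p_{k+1} = p_k P$. Again, it is worth mentioning that this does not mean that it would be impossible to track the object; we are simply making a simplifying assumption about the future state evolution in order to generate a sleeping policy. Given a function $T^A$, it is easy to verify that the solution to (27) is

$$J^{(\ell)}(p) = \sum_{j=1}^{\infty} \min\left\{ \int_{\mathcal{B} - \mathcal{T}} T^A(b, \ell)\, (pP^j)(db),\ c \int_{\mathcal{B} - \mathcal{T}} (pP^j)(db) \right\} \tag{28}$$

and the associated policy is to choose the first value of $u$ such that

$$\int_{\mathcal{B} - \mathcal{T}} T^A(b, \ell)\, (pP^{u+1})(db) > c \int_{\mathcal{B} - \mathcal{T}} (pP^{u+1})(db) \tag{29}$$

(Since no observations are assumed, the sensor's state in each future slot $j$ can be chosen freely, so each slot contributes the smaller of its expected tracking cost and its expected energy cost.) In other words, the policy is to come awake at the first time the expected tracking cost exceeds the expected energy cost, where the tracking cost is defined based on $T^A$ (to be determined); hence the name First Cost Reduction.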
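The FCR computations need only the forward predictions $pP^j$ of the current belief. The sketch below, assuming a finite state space with the terminal state stored last and truncating the horizon at hypothetical `u_max`/`horizon` values, returns the first sleep time at which the predicted tracking cost exceeds the predicted energy cost, together with a truncated evaluation of the sum of per-slot minima of the two costs. Names are hypothetical.

```python
import numpy as np


def fcr_sleep_time(p, P, TA, c, u_max=1000):
    """FCR rule: smallest u >= 0 whose slot u+1 has expected tracking cost
    (based on T^A) exceeding the expected energy cost c * P(object still present).

    p  : (m+1,) current belief; index m is the terminal state T
    P  : (m+1, m+1) transition matrix
    TA : (m,) values T^A(b, l) for this sensor
    Returns u_max if the condition does not hold within the truncation horizon.
    """
    m = len(p) - 1
    q = np.asarray(p, dtype=float)
    for u in range(u_max + 1):
        q = q @ P  # q = p P^{u+1}
        if q[:m] @ TA > c * q[:m].sum():
            return u
    return u_max


def fcr_value(p, P, TA, c, horizon=1000):
    """Truncated evaluation of the FCR value: each future slot j costs the
    smaller of its expected tracking cost and its expected energy cost."""
    m = len(p) - 1
    q = np.asarray(p, dtype=float)
    total = 0.0
    for _ in range(horizon):
        q = q @ P  # q = p P^j
        total += min(q[:m] @ TA, c * q[:m].sum())
    return total


P = np.array([[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.0, 0.0, 1.0]])
p = np.array([1.0, 0.0, 0.0])
TA = np.array([1.0, 0.0])  # illustrative: the sensor only matters at state 0
print(fcr_sleep_time(p, P, TA, c=0.5))  # prints: 1
```

In this toy chain the object is certainly away from state 0 in the next slot, so the sensor sleeps for one slot and wakes when the object is predicted to return.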

The solutions to the per-sensor Bellman equations in (26) and (27) define the $Q_{MDP}$ and FCR policies for each sensor, respectively. Note that, unlike [1], [12], [13], the solution to the $Q_{MDP}$ recursion does not necessarily provide a lower bound on the optimal value function, since the employed tracking cost is not a lower bound on the actual tracking cost. In Section III-D we derive a lower bound on the optimal energy-tracking tradeoff for discrete state spaces with Gaussian observations. The remaining task is to identify appropriate values of $T^A(b, \ell)$ for all $b \neq \mathcal{T}$ and for all $\ell$. This is the subject of the next two sections.

B. Nonlearning approach

For now, suppose that $\mathcal{B}$ is a finite space. Suppose $b_{k-1} = b$. To generate $T^A(b, \ell)$ for a particular $\ell$, we first assume a "baseline" behavior for the sensors, i.e., we make an assumption about the set of sensors that are awake at time $k$ given that $b_{k-1} = b$. We consider two possibilities:

1) All sensors are asleep.

2) The set of awake sensors is selected through a greedy algorithm. In other words, the sensor that causes the largest decrease in expected tracking cost is added to the awake set until any further reduction due to a single sensor is less than $c$. The expected tracking cost can be evaluated through Monte Carlo simulation (repeatedly simulating our system from time $k - 1$ to time $k$) to avoid the need for numerical integration.




Starting with this set of awake sensors, the value of $T^A(b, \ell)$ is then computed as the absolute difference in expected tracking cost incurred by changing the state of sensor $\ell$. Again, Monte Carlo simulation can be used to evaluate the change in expected tracking cost. We can think of this procedure as linearizing the tracking cost about some baseline behavior.
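The baseline construction and the linearization described above can be sketched as follows. The Monte Carlo estimator of the expected tracking cost is abstracted as a hypothetical callback `expected_cost(S)`, which should return the expected tracking cost at time $k$ when exactly the sensors in `S` are awake, given $b_{k-1} = b$; the helper names are also hypothetical.

```python
from typing import Callable, FrozenSet, List

# expected_cost(S): expected tracking cost at time k when exactly the sensors in S
# are awake, given b_{k-1} = b (a Monte Carlo estimate in the text; a callback here).
CostFn = Callable[[FrozenSet[int]], float]


def greedy_awake_set(expected_cost: CostFn, n: int, c: float) -> FrozenSet[int]:
    """Baseline 2): repeatedly wake the sensor giving the largest cost reduction,
    stopping once no single sensor reduces the expected tracking cost by at least c."""
    awake: FrozenSet[int] = frozenset()
    current = expected_cost(awake)
    while True:
        best_gain, best_l = 0.0, None
        for l in range(n):
            if l in awake:
                continue
            gain = current - expected_cost(awake | {l})
            if gain > best_gain:
                best_gain, best_l = gain, l
        if best_l is None or best_gain < c:
            return awake
        awake = awake | {best_l}
        current = expected_cost(awake)


def tracking_cost_increments(expected_cost: CostFn, n: int,
                             baseline: FrozenSet[int]) -> List[float]:
    """T^A(b, l) for l = 0..n-1 at a fixed b: absolute change in expected tracking
    cost when sensor l's state is flipped relative to the baseline awake set."""
    base = expected_cost(baseline)
    return [abs(expected_cost(baseline ^ {l}) - base) for l in range(n)]
```

Calling `tracking_cost_increments` with `frozenset()` as the baseline corresponds to possibility 1), and with the output of `greedy_awake_set` to possibility 2). In practice the callback would be evaluated once per state $b$.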

If $\mathcal{B}$ is not finite, then a parameterized version of $T^A$ can be computed instead. We choose $\hat{m}$ elements of $\mathcal{B} - \mathcal{T}$ and evaluate $T^A$ at these points. The value of $T^A$ at all other values of $b \in \mathcal{B} - \mathcal{T}$ can be computed via an interpolation algorithm. Recall that only an FCR policy is appropriate in the infinite state case, since solving the $Q_{MDP}$ Bellman equation for an infinite number of point mass distributions is infeasible.
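The interpolation algorithm is left unspecified above. For a one-dimensional $\mathcal{B}$, piecewise-linear interpolation with NumPy's `np.interp` is one simple choice; the grid and the $T^A$ values below are purely illustrative assumptions, not values from the paper, and `TA_interp` is a hypothetical name.

```python
import numpy as np

# Illustrative 1-D example: B - T = [0, 10], with T^A(., l) known at m_hat grid points.
grid = np.linspace(0.0, 10.0, 6)                     # m_hat = 6 points (must be increasing)
TA_grid = np.array([0.0, 0.2, 0.9, 0.9, 0.2, 0.0])   # illustrative values only


def TA_interp(b):
    """Piecewise-linear interpolation of T^A(b, l); constant beyond the end points."""
    return np.interp(b, grid, TA_grid)
```

The interpolated function can then be integrated against $pP^j$ in the FCR rule; with a different interpolation scheme, only `TA_interp` would change.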

C. Learning approach 

In this section, we describe an alternative learning-based approach. For ease of exposition, suppose that $\mathcal{B}$ is a finite space. Then our probability measure $p_k$ can be characterized by a probability mass function, which we also denote by $p_k$ (a row vector). Define $\hat{a}_{k,\ell}$ to be the approximated expected increase in tracking cost due to sensor $\ell$ sleeping at time $k$ as

$$\hat{a}_{k,\ell} = \sum_{b \neq \mathcal{T}} p_{k-1}(b)\, T^A(b, \ell) \tag{30}$$

Ideally, we would like this approximation to be equal to the actual expected increase in tracking cost due to sensor $\ell$ sleeping. Unfortunately, we do not have access to actual tracking costs at time $k$ since $b_k$ is not known exactly. However, we do have access to $p_k$, $r_k$, and $p_{k-1}$. It is therefore possible to estimate the tracking cost as

$$\int_{\mathcal{B}} d\big(b, \beta^*(p_k)\big)\, p_k(db) \tag{31}$$

For example, if the Hamming cost is being used, then we can estimate the tracking cost as

$$1 - \max_{b} p_k(\{b\}) \tag{32}$$

and if the squared Euclidean distance is being used, we can estimate the tracking cost using the variance of the measure $p_k$. Next we describe how we learn $T^A$ by solving a least squares problem.

Determining an estimate of the increase in the tracking cost due to the sleeping of sensor $\ell$ at time $k$, denoted $a_{k,\ell}$, depends on the value of $r_{k,\ell}$. If $r_{k,\ell} = 0$, we ignore the observation from sensor $\ell$ and generate a new version of $p_k$ called $p'_k$. We can compute $a_{k,\ell}$ as

$$a_{k,\ell} = \sum_{b \neq \mathcal{T}} p'_k(b)\, d\big(b, \beta^*(p'_k)\big) - \sum_{b \neq \mathcal{T}} p_k(b)\, d\big(b, \beta^*(p_k)\big) \tag{33}$$
If, on the other hand, $r_{k,\ell} > 0$, we first generate an object location $b'_k$ according to $p_k$ and then generate an observation according to the probability measure $\alpha_{b'_k}$. This observation is used to generate a new distribution $p'_k$ from $p_k$. Then we compute $a_{k,\ell}$ as

$$a_{k,\ell} = \sum_{b \neq \mathcal{T}} p_k(b)\, d\big(b, \beta^*(p_k)\big) - \sum_{b \neq \mathcal{T}} p'_k(b)\, d\big(b, \beta^*(p'_k)\big) \tag{34}$$

We now have an approximation sequence $\hat{a}_{k,\ell}$ and an observation sequence $a_{k,\ell}$. At time $k - 1$, our goal is to choose $T^A$ to minimize

$$\mathbb{E}\left[(\hat{a}_{k,\ell} - a_{k,\ell})^2\right] \tag{35}$$

We apply the Robbins-Monro algorithm, a form of stochastic gradient descent, to this problem in order to recursively compute a sequence of $T^A$ that will hopefully solve this minimization problem for large $k$. The update equation is

$$T^A(b, \ell) \leftarrow T^A(b, \ell) - 2\alpha_k\, \mathbb{1}\{b \neq \mathcal{T}\}\, p_{k-1}(b)\, (\hat{a}_{k,\ell} - a_{k,\ell}) \tag{36}$$

where $\alpha_k$ is a step size. Note that $\mathbb{1}\{b \neq \mathcal{T}\}\, p_{k-1}(b)$ is the gradient of $\hat{a}_{k,\ell}$ with respect to $T^A(b, \ell)$.
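The approximation (30) is linear in $T^A$, so the gradient step (36) can be applied to all $b \neq \mathcal{T}$ and all sensors at once as an outer product. A minimal sketch, assuming $T^A$ is stored as an $(m, n)$ array and that the observed increases $a_{k,\ell}$ have already been computed from (33) or (34); the function names are hypothetical.

```python
import numpy as np


def approx_increase(p_prev, TA):
    """Eq. (30): a_hat[l] = sum_{b != T} p_{k-1}(b) T^A(b, l); TA has shape (m, n)."""
    m = TA.shape[0]
    return p_prev[:m] @ TA


def robbins_monro_step(TA, p_prev, a_obs, step):
    """Eq. (36) for all b != T and all sensors l at once.

    p_prev : (m+1,) belief p_{k-1} (index m is the terminal state)
    a_obs  : (n,) observed increases a_{k,l} from (33) or (34)
    step   : step size alpha_k
    """
    m = TA.shape[0]
    residual = approx_increase(p_prev, TA) - a_obs  # a_hat - a, one entry per sensor
    # d a_hat[l] / d TA[b, l] = p_{k-1}(b), so the update is an outer product
    return TA - 2.0 * step * np.outer(p_prev[:m], residual)
```

With a constant `step`, as in the simulations described below, the iterates need not settle exactly when the observations are noisy; the sketch makes no convergence claim.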

Using a constant step size in our simulations, we observed only small oscillations in the values of $T^A$. It is unclear whether there are conditions under which the local or global convergence of this learning algorithm is guaranteed. The difficulty is that the observations we are trying to model depend on the model itself. The problem is reminiscent of optimistic policy iteration (see [18]), the convergence properties of which are little understood. We have left a proof of convergence for future work. It should be pointed out that the algorithm will likely converge more slowly for a two-dimensional network than for a one-dimensional network. The reason is that in two dimensions it is easier for an object to avoid visiting a given object location state, and thus to avoid causing an update to that particular value of $T^A$.

If $\mathcal{B}$ is not finite, then we can again parameterize $T^A$ as in the previous section. The Robbins-Monro algorithm can be applied in this context as well, although the gradient expressions will depend on the type of interpolation used.

D. A Lower Bound 

Unfortunately, deriving a lower bound is generally difficult for the considered problem. However, in this section we derive a lower bound for the special case of a discrete state space with Gaussian observations. Our approach is similar to [13], in which we considered a related scheduling problem. The idea is to combine the observable-after-control assumption with a separable lower bound on the tracking cost, as we demonstrate in what follows.
Given the current belief $p_k$, an action vector $u_k$, and the current residual sleep times vector $r_k$, the expected tracking cost (under the Hamming cost (10)) can be written as

$$\begin{aligned} \mathbb{E}[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k, r_k] &= \sum_{j=1}^{m} \Pr[\hat{b}_{k+1} \neq j \mid p_k, u_k, r_k, b_{k+1} = j]\, \Pr[b_{k+1} = j \mid p_k, u_k] \\ &= \sum_{i=1}^{m} p_k(i) \sum_{j=1}^{m} \Pr(b_{k+1} = j \mid b_k = i)\, \Pr[\hat{b}_{k+1} \neq j \mid p_k, u_k, r_k, b_{k+1} = j] \end{aligned} \tag{37}$$



When awake, the sensors' observations are Gaussian, i.e.,

$$s_{k,\ell} \sim \mathcal{N}\!\left(\frac{\cdots}{(u_\ell - b_k)^2 + 1},\ \sigma^2\right) \tag{38}$$

where $u_\ell$ is the location of sensor $\ell$.
Define

$$P(E \mid H_j) \triangleq \Pr[\hat{b}_{k+1} \neq j \mid p_k, u_k, r_k, b_{k+1} = j],$$

which is a conditional error probability for a multiple hypothesis testing problem with $m$ hypotheses, each corresponding to a different mean vector contaminated with white Gaussian noise. Conditioned on $H_j$, the observation model is

$$H_j: \quad s(\ell) = \big(m_j(\ell) + w(\ell)\big)\,\mathbb{1}\{r_{k+1,\ell} = 0\} + \varepsilon\,\mathbb{1}\{r_{k+1,\ell} > 0\} \tag{39}$$

where $s(\ell)$ is the $\ell$-th entry of an $n \times 1$ vector $s$ denoting the received signal strength at the $n$ sensors, $m_j$ is the mean received signal strength when the target is at state $j$ (the $j$-th hypothesis), and $w$ is zero-mean white Gaussian noise, i.e., $w(\ell) \sim \mathcal{N}(0, \sigma^2)$. According to (39), if awake at the next time step, sensor $\ell$ gets a Gaussian observation that depends on the future target location, and an erasure otherwise. Since the current belief is $p_k$, the prior for the $j$-th hypothesis is $\pi_j = [p_k P]_j$.
The error event $E$ can be written as the union of pairwise error regions as

$$P(E \mid H_j) = \Pr\Big[\bigcup_{k \neq j} \zeta_{kj} \,\Big|\, H_j\Big] \tag{40}$$

where

$$\zeta_{kj} = \left\{ s : L_{kj}(s) > \frac{\pi_j}{\pi_k} \right\}$$

is the region of observations for which the $k$-th hypothesis $H_k$ is more likely than the $j$-th hypothesis $H_j$, and where

$$L_{kj}(s) \triangleq \frac{f(s \mid H_k)}{f(s \mid H_j)}$$

denotes the likelihood ratio for $H_k$ and $H_j$.

Using standard analysis for likelihood ratio tests [22], [23], it is not hard to show that

$$P(\zeta_{kj} \mid H_j) = Q\left(\frac{d_{kj}}{2} + \frac{1}{d_{kj}} \ln\frac{\pi_j}{\pi_k}\right) \tag{41}$$

where $d_{kj}^2 = \Delta m_{kj}^\top \Delta m_{kj} / \sigma^2$, $\Delta m_{kj} = m_k - m_j$ (restricted to the entries of the sensors that are awake), and $Q(\cdot)$ is the Gaussian Q-function, i.e., the tail probability of a standard normal random variable. The quantity $d_{kj}$ plays the role of a distance between the two hypotheses and hence depends on the difference of their corresponding mean vectors and the noise variance $\sigma^2$. Hence, $d_{kj}$ is a function of the next-step residual sleep vector $r_{k+1}$. To highlight this dependence, we will sometimes use the notation $d_{kj}(r)$ when needed. Note that, for different values of $k$, the regions $\zeta_{kj}$ are not generally disjoint, but they allow us to lower bound the error probability in terms of pairwise error probabilities; namely, a lower bound can be written as

$$P(E \mid H_j) \geq \max_{k \neq j} P(\zeta_{kj} \mid H_j) \tag{42}$$
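And we can readily lower bound the expected tracking error:

$$\begin{aligned} \mathbb{E}[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k, r_k] &\geq \sum_{i=1}^{m} p_k(i) \sum_{j=1}^{m} p(b_{k+1} = j \mid b_k = i) \max_{k \neq j} P(\zeta_{kj} \mid H_j) \\ &= \sum_{i=1}^{m} p_k(i) \sum_{j=1}^{m} p(b_{k+1} = j \mid b_k = i) \max_{k \neq j} Q\left(\frac{d_{kj}(r_{k+1})}{2} + \frac{1}{d_{kj}(r_{k+1})} \ln\frac{\pi_j}{\pi_k}\right) \end{aligned} \tag{43}$$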
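The bound (43) can be evaluated numerically for a given awake pattern. The sketch below assumes the Gaussian model (39) with the mean vectors $m_j$ stored as rows of a hypothetical `means` array and a boolean mask `awake` marking the sensors with $r_{k+1,\ell} = 0$; the degenerate cases (a zero prior or a zero distance) are handled by evaluating the region $\zeta_{kj}$ directly. Names are hypothetical.

```python
import math

import numpy as np


def q_func(x):
    """Gaussian Q-function, P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pairwise_error(means, sigma, awake, pi, j, k):
    """Eq. (41): P(zeta_kj | H_j), with d_kj computed from the awake sensors only."""
    if pi[k] == 0.0:  # H_k is never preferred
        return 0.0
    if pi[j] == 0.0:  # any likelihood ratio exceeds pi_j / pi_k = 0
        return 1.0
    diff = (means[k] - means[j])[awake]
    d = math.sqrt(float(diff @ diff)) / sigma
    if d == 0.0:  # indistinguishable: decided by the priors alone
        return 1.0 if pi[k] > pi[j] else 0.0
    return q_func(d / 2.0 + math.log(pi[j] / pi[k]) / d)


def tracking_error_lower_bound(p, P, means, sigma, awake):
    """Right-hand side of (43) for the Hamming cost on m non-terminal states.

    p     : (m+1,) belief p_k (index m is the terminal state)
    P     : (m+1, m+1) transition matrix
    means : (m, n) array, means[j] = mean vector m_j
    awake : (n,) boolean mask, True where r_{k+1,l} = 0
    """
    m = means.shape[0]
    pi = (p @ P)[:m]  # priors pi_j = [p_k P]_j
    total = 0.0
    for j in range(m):
        worst = max(pairwise_error(means, sigma, awake, pi, j, k)
                    for k in range(m) if k != j)
        # sum_i p_k(i) P(b_{k+1} = j | b_k = i) equals pi_j, since T is absorbing
        total += pi[j] * worst
    return total
```

Comparing the value for different `awake` masks gives a quick numerical view of how much each sleeping pattern can at best cost in tracking error.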

Next we separate out the effect of each sensor on the tracking error:

$$\mathbb{E}[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, u_k, r_k] \overset{(a)}{\geq} \mathbb{1}\{r_{k+1,\ell} = 0\}\, \mathbb{E}[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, r_{k+1} = \mathbf{0}] + \mathbb{1}\{r_{k+1,\ell} > 0\}\, \mathbb{E}[d(b_{k+1}, \hat{b}_{k+1}) \mid p_k, r_{k+1,i} = 0\ \forall i \neq \ell] \quad \text{for every } \ell \tag{44}$$

where $\mathbf{0}$ is the all-zero vector designating that all sensors will be awake at the next time slot. The inequality in (a) follows from the fact that if we separate out the effect of the $\ell$-th sensor, we get a better tracking performance when all the remaining sensors are awake. Since this holds for every $\ell$, a lower bound on the expected tracking error can be written as a convex combination of all sensors' contributions:

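$$\begin{aligned} E[d(b_{k+1},\hat b_{k+1}) \mid p_k,u_k,r_k] \ge \sum_{\ell=1}^n \lambda_\ell(p_k)\Big( & \mathbb{1}\{r_{k+1,\ell}=0\}\, E[d(b_{k+1},\hat b_{k+1}) \mid p_k, r_{k+1}=\mathbf{0}] \\ &+ \mathbb{1}\{r_{k+1,\ell}>0\}\, E[d(b_{k+1},\hat b_{k+1}) \mid p_k, r_{k+1,i}=0\ \forall i\neq\ell]\Big) \end{aligned} \tag{45}$$

where $\lambda_\ell(p_k) \ge 0$ and $\sum_{\ell=1}^n \lambda_\ell(p_k) = 1$.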
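Let $\mathbf{0}_{-\ell}$ denote a vector of length $n$ with all entries equal to zero except for the $\ell$-th entry, which can be anything greater than 0. Then, substituting from (43),

$$\begin{aligned} E[d(b_{k+1},\hat b_{k+1}) \mid p_k,u_k,r_k] \ge \sum_{\ell=1}^n \lambda_\ell(p_k)\Bigg( & \mathbb{1}\{r_{k+1,\ell}=0\} \sum_{i=1}^m p_k(i) \sum_{j=1}^m p(b_{k+1}=j\mid b_k=i) \max_{k\neq j} Q\left(\frac{d_{kj}(\mathbf{0})}{2} + \frac{\ln(\pi_j/\pi_k)}{d_{kj}(\mathbf{0})}\right) \\ &+ \mathbb{1}\{r_{k+1,\ell}>0\} \sum_{i=1}^m p_k(i)\sum_{j=1}^m p(b_{k+1}=j\mid b_k=i)\max_{k\neq j} Q\left(\frac{d_{kj}(\mathbf{0}_{-\ell})}{2} + \frac{\ln(\pi_j/\pi_k)}{d_{kj}(\mathbf{0}_{-\ell})}\right)\Bigg) \end{aligned} \tag{46}$$

To simplify notation, we introduce the following two quantities, where $\pi_j = [pP]_j$:

$$T_0(p;i,\ell) = \sum_{j=1}^m p(b_{k+1}=j\mid b_k=i) \max_{k\neq j} Q\left(\frac{d_{kj}(\mathbf{0})}{2} + \frac{\ln(\pi_j/\pi_k)}{d_{kj}(\mathbf{0})}\right)$$

$$T(p;i,\ell) = \sum_{j=1}^m p(b_{k+1}=j\mid b_k=i) \max_{k\neq j} Q\left(\frac{d_{kj}(\mathbf{0}_{-\ell})}{2} + \frac{\ln(\pi_j/\pi_k)}{d_{kj}(\mathbf{0}_{-\ell})}\right)$$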

Intuitively, $T_0(p;i,\ell)$ represents the contribution of sensor $\ell$ to the total expected tracking cost when the underlying state is $i$, the belief is $p$, and all sensors are awake. On the other hand, $T(p;i,\ell)$ is the contribution of sensor $\ell$ when it is asleep and all the other sensors are awake.
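To make $T_0$ and $T$ concrete, the following sketch evaluates them on a toy network. Because the observation model (38) is not restated here, it assumes a hypothetical mean model in which an awake sensor at position $x_\ell$ reports $\exp(-(j-x_\ell)^2)$ plus $\mathcal{N}(0,\sigma^2)$ noise when the object is at location $j$; an asleep sensor simply drops out of $d_{kj}$. The degenerate cases $d_{kj}=0$ and zero priors are handled with the limits of (41). All names and numbers are illustrative assumptions.

```python
import math

import numpy as np


def qfunc(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pairwise(d, pi_k, pi_j):
    """p(zeta_kj | H_j) from (41), including its limiting cases."""
    if pi_k == 0.0:
        return 0.0                      # H_k can never win the MAP comparison
    if pi_j == 0.0:
        return 1.0                      # any H_k with positive prior beats H_j
    if d < 1e-12:                       # indistinguishable means: the priors decide
        return 0.0 if pi_j > pi_k else (1.0 if pi_j < pi_k else 0.5)
    return qfunc(d / 2.0 + math.log(pi_j / pi_k) / d)


def tracking_term(p, P, means, sigma, i, awake):
    """sum_j P[i, j] * max_{k != j} p(zeta_kj | H_j), using only sensors flagged in `awake`."""
    pi = p @ P                          # pi_j = [p P]_j
    M = means[:, awake]                 # row j: mean vector of the awake sensors under H_j
    m = len(pi)
    total = 0.0
    for j in range(m):
        if P[i, j] == 0.0:
            continue
        worst = max(pairwise(np.linalg.norm(M[k] - M[j]) / sigma, pi[k], pi[j])
                    for k in range(m) if k != j)
        total += P[i, j] * worst
    return total


def T0(p, P, means, sigma, i):
    """T_0(p; i, ell): every sensor awake (the value does not depend on ell)."""
    return tracking_term(p, P, means, sigma, i, np.ones(means.shape[1], dtype=bool))


def T(p, P, means, sigma, i, ell):
    """T(p; i, ell): sensor ell asleep, all other sensors awake."""
    awake = np.ones(means.shape[1], dtype=bool)
    awake[ell] = False
    return tracking_term(p, P, means, sigma, i, awake)


# Hypothetical toy network: 5 locations, 3 sensors, lazy random walk that can exit at both ends.
m, sensor_pos, sigma = 5, np.array([0.5, 2.0, 3.5]), 0.3
P = np.zeros((m, m))
for i in range(m):
    P[i, i] = 0.5
    if i > 0:
        P[i, i - 1] = 0.25
    if i < m - 1:
        P[i, i + 1] = 0.25
means = np.exp(-(np.arange(m)[:, None] - sensor_pos[None, :]) ** 2)
e2 = np.eye(m)[2]                       # belief concentrated on location index 2
print(T0(e2, P, means, sigma, 2), [T(e2, P, means, sigma, 2, ell) for ell in range(3)])
```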

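Now, if we assume that the target will be perfectly observable after taking the sleeping action, a lower bound on the total cost can be obtained from the solution of the following Bellman equation:

$$J(p, r) = \sum_{\ell=1}^n J_\ell(p, r_\ell) \tag{47}$$

where

$$\begin{aligned} J_\ell(p, r_\ell) = \min_{u_\ell} \Bigg\{ & \mathbb{1}\{u_\ell = 0\}\Big( \sum_{b=1}^m p(b)\,\lambda_\ell(p)\,T_0(p;b,\ell) + c\sum_{i=1}^m [pP]_i + \sum_{i=1}^m [pP]_i\, J_\ell(e_i, 0)\Big) \\ &+ \mathbb{1}\{u_\ell > 0\}\Big( \sum_{b=1}^m p(b)\,\lambda_\ell(p)\,T(p;b,\ell) + \sum_{i=1}^m [pP]_i\, J_\ell(e_i, u_\ell)\Big) \Bigg\} \end{aligned} \tag{48}$$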

Note that if we can solve the equation above for $p = e_i$ for all $i \in \{1,\dots,m\}$, then it is straightforward to find the solution for all other values of $p$. We therefore focus on specifying the value function at those points. Accordingly, we further simplify our notation and use $T(i,\ell)$, $T_0(i,\ell)$, and $\lambda(i,\ell)$ as shorthand for $T(e_i;i,\ell)$, $T_0(e_i;i,\ell)$, and $\lambda_\ell(e_i)$, respectively. Also, since an action only needs to be made when the sensor wakes up, we only need to define actions at $r_\ell = 0$. Observing that

$$J_\ell(e_j, u) = \lambda(j,\ell)\,T(j,\ell) + \sum_{i=1}^m [e_j P]_i\, J_\ell(e_i, u-1), \quad \forall u > 1 \tag{49}$$

and

$$J_\ell(e_j, 1) = \lambda(j,\ell)\,T_0(j,\ell) + c\sum_{i=1}^m [e_j P]_i + \sum_{i=1}^m [e_j P]_i\, J_\ell(e_i, 0) \tag{50}$$

we recursively substitute from (49) and (50) into (48) until the system reaches $(e_i, 0)$. We can see that a lower bound on the value function of sensor $\ell$ can be obtained as the solution of the following minimization problem over $u_\ell$, where $u_\ell$ is the control action for sensor $\ell$ given a belief state $e_b$ (we write $J_\ell(e_b)$ for $J_\ell(e_b, 0)$):

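$$\begin{aligned} J_\ell(e_b) = \min_{u_\ell}\Bigg\{ & \sum_{j=0}^{u_\ell-1}\sum_{i=1}^m [e_b P^j]_i\,\lambda(i,\ell)\,T(i,\ell) + \sum_{i=1}^m [e_b P^{u_\ell}]_i\,\lambda(i,\ell)\,T_0(i,\ell) \\ &+ c\sum_{i=1}^m [e_b P^{u_\ell+1}]_i + \sum_{i=1}^m [e_b P^{u_\ell+1}]_i\, J_\ell(e_i)\Bigg\} \end{aligned} \tag{51}$$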
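For a fixed $\lambda(\cdot,\ell)$, (51) is a finite-state problem, so it can be solved by policy iteration once the sleep time is capped. The sketch below assumes a cap `u_max` on $u_\ell$ and a substochastic $P$ whose missing row mass is the probability that the object leaves the network (so every policy eventually terminates); the function name and the toy inputs are hypothetical.

```python
import numpy as np


def per_sensor_bound(P, lam, T, T0, c, u_max=20, max_iter=200):
    """Policy iteration for the per-sensor problem (51), sleep times u in {0, ..., u_max}.

    P    : (m, m) substochastic matrix; missing row mass = probability of leaving the network.
    lam  : (m,) weights lambda(i, ell); T, T0 : (m,) values T(i, ell) and T_0(i, ell).
    Returns J (J[b] approximates J_ell(e_b)) and the minimizing sleep time u[b].
    """
    m = P.shape[0]
    a, a0 = lam * T, lam * T0
    powers = [np.eye(m)]                            # powers[j] = P^j, j = 0..u_max+1
    for _ in range(u_max + 1):
        powers.append(powers[-1] @ P)
    # g[u, b]: all terms of (51) except the continuation sum_i [e_b P^{u+1}]_i J(e_i)
    g = np.empty((u_max + 1, m))
    asleep = np.zeros(m)                            # running sum_{j<u} P^j a
    for u in range(u_max + 1):
        g[u] = asleep + powers[u] @ a0 + c * powers[u + 1].sum(axis=1)
        asleep = asleep + powers[u] @ a
    policy = np.zeros(m, dtype=int)                 # start from "never sleep"
    cols = np.arange(m)
    for _ in range(max_iter):
        # Policy evaluation: J = g_pi + M_pi J, row b of M_pi is row b of P^{u(b)+1}
        M_pi = np.array([powers[policy[b] + 1][b] for b in range(m)])
        J = np.linalg.solve(np.eye(m) - M_pi, g[policy, cols])
        # Policy improvement; switch only on strict improvement to avoid cycling on ties
        Q = g + np.stack([powers[u + 1] @ J for u in range(u_max + 1)])
        best = Q.argmin(axis=0)
        improve = Q[best, cols] < Q[policy, cols] - 1e-12
        if not improve.any():
            break
        policy = np.where(improve, best, policy)
    return J, policy


# Hypothetical example: 7 locations, +/-1 random walk exiting at both ends, a single sensor.
m = 7
P = np.zeros((m, m))
for i in range(m):
    if i > 0:
        P[i, i - 1] = 0.5
    if i < m - 1:
        P[i, i + 1] = 0.5
lam = np.ones(m)                  # one sensor, so lambda(i, ell) = 1
T_vals = np.full(m, 0.5)          # hypothetical T(i, ell)
T0_vals = np.zeros(m)             # hypothetical T_0(i, ell)
J, u = per_sensor_bound(P, lam, T_vals, T0_vals, c=0.1)
print(J.round(3), u)
```

Raising `c` makes longer sleep times more attractive, which is the energy-tracking tradeoff the bound is meant to capture.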

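Equation (51), together with (47), defines a lower bound on the total expected cost. To further tighten the bound, we can now optimize over a matrix $\Lambda$ for every value of $c$, where $\Lambda(c)$ is an $m \times n$ matrix with $(i,\ell)$ entry equal to $\lambda(i,\ell)$, i.e., $\Lambda(c) = \{\lambda(i,\ell)\}$. Hence,

$$\begin{aligned} J(e_b) = \max_{\Lambda} \sum_{\ell=1}^n \min_{u_\ell}\Bigg\{ & \sum_{j=0}^{u_\ell-1}\sum_{i=1}^m [e_b P^j]_i\,\lambda(i,\ell)\,T(i,\ell) + \sum_{i=1}^m [e_b P^{u_\ell}]_i\,\lambda(i,\ell)\,T_0(i,\ell) \\ &+ c\sum_{i=1}^m [e_b P^{u_\ell+1}]_i + \sum_{i=1}^m [e_b P^{u_\ell+1}]_i\, J_\ell(e_i)\Bigg\} \\ \text{subject to }\ & \Lambda \mathbf{1}_n = \mathbf{1}_m \end{aligned} \tag{52}$$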

where $\mathbf{1}_m$ is a column vector of all ones of length $m$ (so each row of $\Lambda$ sums to one). A closed-form solution to (52) cannot be obtained, and hence we solve for $J(e_b)$ numerically. First, we fix $\Lambda$ and use policy iteration [18] to solve for the control of each sensor at each state. Then, we change $\Lambda$ and repeat the process. Since each fixed $\Lambda$ yields a lower bound, the upper envelope (pointwise maximum) of the generated value functions, corresponding to different instances of $\Lambda$, is also a lower bound on the optimal value function.
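The outer maximization over $\Lambda$ in (52) can be approximated by evaluating many feasible $\Lambda$ and keeping the pointwise maximum. The sketch below draws each row of $\Lambda$ from a flat Dirichlet distribution (so rows are nonnegative and sum to one) and, purely for brevity, solves each per-sensor problem by value iteration rather than the policy iteration used in the text. Random search, the sleep-time cap `u_max`, and all names are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def per_sensor_value(P, a, a0, c, u_max=20, tol=1e-10, max_iter=10_000):
    """Value iteration for (51) for one sensor, with a = lambda*T and a0 = lambda*T_0."""
    m = P.shape[0]
    powers = np.empty((u_max + 2, m, m))            # powers[j] = P^j
    powers[0] = np.eye(m)
    for j in range(1, u_max + 2):
        powers[j] = powers[j - 1] @ P
    # asleep[u] = sum_{j<u} P^j a  (zero row for u = 0)
    asleep = np.concatenate([np.zeros((1, m)), np.cumsum(powers[:u_max] @ a, axis=0)])
    g = asleep + powers[:u_max + 1] @ a0 + c * powers[1:].sum(axis=2)
    J = np.zeros(m)
    for _ in range(max_iter):
        J_new = (g + powers[1:] @ J).min(axis=0)    # Bellman update over all u at once
        if np.max(np.abs(J_new - J)) < tol:
            break
        J = J_new
    return J_new


def lower_bound_envelope(P, T, T0, c, num_samples=50, seed=0):
    """Pointwise max over sampled Lambda of sum_ell J_ell(e_b), cf. (52).

    T, T0 : (m, n) arrays of T(i, ell) and T_0(i, ell).
    """
    rng = np.random.default_rng(seed)
    m, n = T.shape
    best = np.full(m, -np.inf)
    for _ in range(num_samples):
        Lam = rng.dirichlet(np.ones(n), size=m)     # m x n, rows sum to one
        total = sum(per_sensor_value(P, Lam[:, l] * T[:, l], Lam[:, l] * T0[:, l], c)
                    for l in range(n))
        best = np.maximum(best, total)              # envelope over the sampled Lambda
    return best


# Hypothetical example: 5 locations, +/-1 random walk exiting at both ends, 2 sensors.
m = 5
P = np.zeros((m, m))
for i in range(m):
    if i > 0:
        P[i, i - 1] = 0.5
    if i < m - 1:
        P[i, i + 1] = 0.5
T = np.array([[0.4, 0.1], [0.3, 0.2], [0.25, 0.25], [0.2, 0.3], [0.1, 0.4]])
T0 = np.zeros((m, 2))
print(lower_bound_envelope(P, T, T0, c=0.05, num_samples=20))
```

Adding more samples can only raise the envelope, which mirrors the tightening step described above.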

IV. Numerical Results 

In this section, we show some simulation results illustrating the performance of the policies we derived in previous sections. These results are for one-dimensional sensor networks, but the general behavior should extend to two-dimensional networks. In each simulation run, the object was initially placed at the center of the network and the location of the object was made known to each sensor. A simulation run concluded when the object left the network. The results of many simulation runs were then averaged to compute an average tracking cost and an average energy cost. To allow for easier interpretation of our results, we then normalized our costs by dividing by the expected time the object spends in the network. We refer to these normalized costs as costs per unit time, even though the true costs per unit time would use the actual times the object spent in the network (the difference between the two was found to be small).
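When the motion model is a finite Markov chain with an exit, the normalizing constant above has a closed form: if $P$ is the substochastic transition matrix restricted to in-network locations, the expected number of slots spent in the network from each starting location is $(I - P)^{-1}\mathbf{1}$. The sketch assumes the object leaves as soon as a step would take it past either end of the line; the function names are illustrative.

```python
import numpy as np


def random_walk_matrix(num_locations, step_probs):
    """Substochastic P: step_probs maps a displacement to its probability; steps off either end exit."""
    P = np.zeros((num_locations, num_locations))
    for i in range(num_locations):
        for delta, prob in step_probs.items():
            if 0 <= i + delta < num_locations:
                P[i, i + delta] += prob
    return P


def expected_time_in_network(P, start):
    """Expected number of time steps before exit: entry `start` of (I - P)^{-1} 1."""
    m = P.shape[0]
    return np.linalg.solve(np.eye(m) - P, np.ones(m))[start]


def per_unit_time(avg_total_cost, P, start):
    """Divide run-averaged total costs by the expected time spent in the network."""
    return np.asarray(avg_total_cost, dtype=float) / expected_time_in_network(P, start)


# Network A: 41 locations, +/-1 with probability 1/2 each, object starts at the centre (index 20).
P_A = random_walk_matrix(41, {-1: 0.5, +1: 0.5})
print(expected_time_in_network(P_A, start=20))   # approximately 441 = 21 * 21 (gambler's ruin)
```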

For the non-learning policies, the value of $T_A(b,\ell)$ for each $b$ and $\ell$ was generated using 200 Monte Carlo simulations. The results of 50 simulation runs were averaged when plotting the curves. For the learning policies, the values of $T_A$ were initialized to those obtained from the non-learning approach using greedy sensor selection as a baseline. A constant step size of 0.01 was used in the learning algorithm. First, 100 simulation runs were performed without recording the results, while the values of $T_A$ stabilized. Then an additional 50 simulation runs were performed ($T_A$ continued to be updated), and these results were averaged when plotting curves. In the case of $Q_{\mathrm{MDP}}$ learning policies, computation time was saved by performing policy iteration only after every fifth simulation run.

We first consider a simple network that we term Network A. This is a one-dimensional network with 41 possible object locations, where the object moves with equal probability either one step to the left or one step to the right in each time step. There is a sensor at each of the 41 object locations that makes (when awake) a binary observation that determines without error whether the object is at that location. Hamming cost is used for the tracking cost.

For Network A, we illustrate the performance of the $Q_{\mathrm{MDP}}$ versions of our policies in Figure 1(a) and the FCR versions of our policies in Figure 1(b).

The curves labeled "Asleep" are for the non-learning approach for computing $T_A$ in which all sensors are assumed to be asleep as a baseline. The curves labeled "Greedy" are for the non-learning approach for computing $T_A$ in which a greedy algorithm is used to determine the baseline. The curves labeled "Learning" employ our learning algorithm for computing $T_A$.

From the tradeoff curves, it is apparent that using the learning algorithm to compute $T_A$ results in improved performance. A close inspection of Figures 1(a) and 1(b) reveals that the $Q_{\mathrm{MDP}}$ policies perform somewhat better than their FCR counterparts. This is consistent with what was observed in [2].

It is instructive to consider the final matrix of values of $T_A(b,\ell)$ obtained at the end of all learning-algorithm simulations. In Figures 2 and 3 we plot this matrix for the $Q_{\mathrm{MDP}}$ learning policy simulations for the smallest and for the largest $c$ used in simulation, respectively. In Figure 2 it is evident that only a single sensor has an impact for each value of $b$. Due to the way our simulations worked, it is the sensor to the left that has the impact, but it could just as easily have been the sensor to the right of the current object position. That most of the nonzero values of the matrix are less than 0.5 reflects the fact that the sensor to the right of the current object location might wake up due to a sleep time selected at a previous time step. In Figure 3 it is evident that the sensors on either side of the current object location (which is actually not known, since Figure 3 corresponds to the case where no sensors are awake) appear to have a major impact on the tracking cost. There are nonzero values off the two main diagonals due to the probabilistic nature of the learning process when the actual object location is not known.

Fig. 3. The final matrix for $T_A$ for the $Q_{\mathrm{MDP}}$ learning policy and large $c$ for Network A.

TABLE I
Object movement for Network B.

| Change in position | Probability |
|---|---|
| 0 | 0.3125 |
| 1 | 0.2344 |
| 2 | 0.0938 |
| 3 | 0.0156 |

We now consider a new one-dimensional network termed Network B. The possible object locations are the integers from 1 to 21. In each time step, the object moves according to a random walk anywhere from three steps to the left to three steps to the right. The distribution of these movements is given in Table I, where a change in position of $k$ denotes movement by $k$ steps to the right or to the left, each with the listed probability. There are 10 sensors in this network, so that $m \neq n$. The locations of the sensors are given in Table II, and awake sensors make Gaussian observations as in (38).
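Table I can be turned into the transition matrix the policies operate on. The sketch below assumes, consistent with the simulation setup, that a step taking the object outside locations 1 to 21 means it has left the network, so $P$ is substochastic; it also renormalizes the rounded table, whose two-sided total is 1.0001. Incidentally, the tabulated values coincide to four decimals with a centred Binomial(6, 1/2) step; this is an observation about the numbers, not a claim about how they were generated. Names are illustrative.

```python
from math import comb

import numpy as np

# Table I: probability of no move (k = 0) or of a move of k positions in each direction (k = 1, 2, 3)
TABLE_I = {0: 0.3125, 1: 0.2344, 2: 0.0938, 3: 0.0156}


def step_distribution(table):
    """Two-sided displacement distribution on -3..3, renormalized to absorb the table's rounding."""
    dist = {0: table[0]}
    for k in (1, 2, 3):
        dist[k] = dist[-k] = table[k]
    total = sum(dist.values())                     # about 1.0001 before renormalization
    return {k: v / total for k, v in dist.items()}


def transition_matrix(num_locations, dist):
    """Substochastic P over locations 1..num_locations (0-based indices); steps off either end exit."""
    P = np.zeros((num_locations, num_locations))
    for i in range(num_locations):
        for delta, prob in dist.items():
            if 0 <= i + delta < num_locations:
                P[i, i + delta] += prob
    return P


# The tabulated values agree, to four decimals, with C(6, 3 + k) / 2**6.
matches_binomial = all(abs(TABLE_I[k] - comb(6, 3 + k) / 64) < 1e-4 for k in TABLE_I)
dist = step_distribution(TABLE_I)
P_B = transition_matrix(21, dist)
print(matches_binomial, P_B[10].sum())
```

Rows near the ends of the line sum to less than one; the missing mass is the probability that the object exits, which is what ends a simulation run.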

Results for the $Q_{\mathrm{MDP}}$ and FCR versions of our policies are shown in Figures 4(a) and 4(b), respectively. The results confirm the same general trends observed for Network A. The figures also show our derived lower bound on the energy-tracking tradeoff, computed using the approach described in Sec. III-D. Not surprisingly, the lower bound is particularly loose in the high tracking cost regime, yet the gap is reasonably small in the low tracking error regime. This is expected, since the lower bound uses an all-awake assumption to lower bound the contribution of each sensor to the tracking error. However, it is worth mentioning that we can exactly compute the saturation point for the optimal scheduling policy, which matches the saturation limit of the shown curves, since every policy has to eventually meet the all-asleep performance curve when the energy cost per sensor is high. At that point, all sensors are put to sleep, and hence the target estimate can only be based on prior information. The small gap in the low tracking error regime, combined with the aforementioned saturation effect, highlights the good performance of our sleeping policies.

TABLE II
Sensor locations for Network B.

| Sensor | Location |
|---|---|
| 1 | 1.36 |
| 2 | 1.61 |
| 3 | 3.91 |
| 4 | 8.09 |
| 5 | 11.96 |
| 6 | 13.39 |
| 7 | 13.52 |
| 8 | 13.66 |
| 9 | 16.60 |
| 10 | 18.68 |
For illustration, we plot the matrix for $T_A$ for the $Q_{\mathrm{MDP}}$ learning policy simulations for the smallest $c$ and for the largest $c$, when the object moves according to a symmetric random walk, in Figures 5 and 6, respectively. Note the difference between the rows corresponding to object locations 7 and 8 in Figure 5. Examining the sensor locations, we see that sensor 4 is located at 8.09. This sensor is useful for distinguishing between object locations 6 and 8 (for an initial object position of 7) but is of less value for distinguishing between object locations 7 and 9 (for an initial object position of 8). This is evidenced in the figure by a large value for $T_A(7,4)$ and a small value for $T_A(8,4)$.

Fig. 5. The final matrix for $T_A$ for the $Q_{\mathrm{MDP}}$ learning policy and small $c$ for Network B.

Fig. 7. Tradeoff curves for FCR policies for Network C.

To demonstrate that our techniques can be applied to an object that moves on a continuum, we define a new network, Network C. This network is identical to Network B except for two changes. First, the object can take locations anywhere on the interval [1, 21]. Second, the object moves according to Brownian motion, with the change in position between time steps having a Gaussian distribution with mean zero and variance 1. As mentioned earlier, only FCR policies can be generated for this type of network. Values of $T_A$ were computed for each integer-valued object location on [1, 21], and linear interpolation was used to compute values of $T_A$ for other object locations. Since continuous distributions cannot be easily stored, particle filtering techniques were employed (e.g., see [19]). The number of particles used was 512, and resampling was performed at each time step. Consistent with particle filtering, when generating the sleep times, the future probability distributions were approximated through Monte Carlo movement of the particles. The number of simulation runs averaged for each data point was increased to 200 for these simulations.
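The particle-filtering steps described above can be sketched as a standard bootstrap filter: propagate each particle with an $\mathcal{N}(0,1)$ increment, weight by the Gaussian likelihood of the awake sensors' readings, and resample at every step. Future beliefs for choosing sleep times come from propagating particles without observations, and $T_A$ at non-integer positions comes from linear interpolation. Since (38) is not restated here, the observation mean `sensor_means` is a hypothetical Gaussian-bump model; the noise level and all function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

SENSOR_LOCATIONS = np.array([1.36, 1.61, 3.91, 8.09, 11.96,
                             13.39, 13.52, 13.66, 16.60, 18.68])   # Table II
LOW, HIGH = 1.0, 21.0
NUM_PARTICLES = 512


def sensor_means(x):
    """HYPOTHETICAL stand-in for the mean in (38): a Gaussian bump around each sensor."""
    return np.exp(-0.5 * (np.asarray(x, dtype=float)[..., None] - SENSOR_LOCATIONS) ** 2)


def propagate(particles, rng):
    """Brownian-motion step: each particle moves by an N(0, 1) increment."""
    return particles + rng.normal(0.0, 1.0, size=particles.shape)


def pf_update(particles, obs, awake, sigma, rng):
    """One bootstrap particle-filter step with multinomial resampling at every time step.

    obs: readings from all sensors (entries of asleep sensors are ignored).
    Particles outside [1, 21] get zero weight, i.e. we condition on the object still being in the network.
    """
    particles = propagate(particles, rng)
    inside = (particles >= LOW) & (particles <= HIGH)
    if not inside.any():
        raise RuntimeError("particle filter lost track of the object")
    resid = obs[awake] - sensor_means(particles)[:, awake]
    log_w = np.where(inside, -0.5 * np.sum(resid ** 2, axis=1) / sigma ** 2, -np.inf)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return particles[rng.choice(particles.size, size=particles.size, p=w)]


def predict(particles, steps, rng):
    """Monte Carlo belief `steps` slots ahead; particles outside [1, 21] stand for an exited object."""
    for _ in range(steps):
        particles = propagate(particles, rng)
    return particles


def interp_T(values_at_integers, x):
    """Linearly interpolate one sensor's T_A values, given at locations 1..21, at positions x."""
    return np.interp(x, np.arange(1, 22), values_at_integers)


rng = np.random.default_rng(0)
particles = np.full(NUM_PARTICLES, 11.0)     # object starts at the centre, known to all sensors
x_true = 11.0 + rng.normal()
awake = np.ones(SENSOR_LOCATIONS.size, dtype=bool)
sigma = 0.1                                  # hypothetical noise level
obs = sensor_means(x_true) + sigma * rng.normal(size=SENSOR_LOCATIONS.size)
particles = pf_update(particles, obs, awake, sigma, rng)
print(x_true, particles.mean())
```

The explicit `RuntimeError` marks the failure mode discussed next, where every particle drifts away from the true trajectory; a recovery mechanism would replace it.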

Tradeoff curves for Network C are shown in Figure 7. Although the tradeoff curves are less smooth than before, this figure illustrates performance trends similar to those already seen. The curves are not as smooth because the particle filter occasionally failed to track the distribution with sufficient accuracy. This caused the network to lose track of the object and resulted in abnormally bad tracking for that simulation run. These outliers were not removed when generating the tradeoff curves. A recovery mechanism would need to be added to the sleeping policies to overcome this limitation of particle filters.

V. Conclusion 

In this paper, we considered energy-efficient tracking of an object moving through a network of wireless sensors. While an optimal solution could not be found, it was possible to design suboptimal, yet efficient, sleeping policies for general motion, sensing, and cost models. We proposed $Q_{\mathrm{MDP}}$ and FCR approximate policies: in the former, the system is assumed to be perfectly observable after control, and in the latter, totally unobservable. We combined these approximations with a decomposition of the optimization problem into simpler per-sensor subproblems, and developed learning-based and non-learning approaches to compute the parameters of each subproblem. The learning-based $Q_{\mathrm{MDP}}$ policies were shown to provide the best energy-tracking tradeoff. In the low tracking error regime, our sleeping policies approach a derived lower bound on the optimal energy-tracking tradeoff.

Avenues for future research include developing distributed sleeping strategies in the absence of central 
control and solving the tracking problem for unknown or partially known object movement statistics. 